Collaborator

@zhjwpku zhjwpku commented Apr 14, 2025

No description provided.

Signed-off-by: Junwang Zhao <[email protected]>
@zhjwpku zhjwpku changed the title from "feat: snapshot ser/der" to "feat: snapshot serde" Apr 14, 2025
@zhjwpku zhjwpku requested a review from Copilot April 14, 2025 12:36
Copilot AI left a comment

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

src/iceberg/json_internal.cc:574

  • Consider verifying that summary_json is an object before iterating over its items to avoid potential exceptions if the JSON structure does not match expectations.
for (const auto& [key, value] : summary_json.items()) {

src/iceberg/type_fwd.h:105

  • [nitpick] Changing TableMetadata from a class to a struct may affect encapsulation; please confirm that this change is intentional and consistent with its usage elsewhere.
struct TableMetadata;

Member

@wgtmac wgtmac left a comment

LGTM, Thanks!

}
}

Result<std::unique_ptr<Snapshot>> SnapshotFromJson(const nlohmann::json& json) {
Member

I think we need to be consistent with the Java impl: https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotParser.java. Specifically, we need to deal with cases where sequence number or summary is missing.

@Fokko Will it actually happen that a snapshot does not have summary (and thus operation is also missing)?

Member

From the spec, summary is required for v2 and v3 but optional for v1. So I believe the spec answers my question. We have to handle this.

Contributor

You're correct!

  • For V1 the summary is optional.
  • For V2/V3 the summary is required, and so is the operation. Some writers produced malformed metadata in the past. Instead of throwing an exception, we would assume it is an overwrite operation, since that's the most generic one.

Member

It sounds like we need to set operation to overwrite when summary is available but operation is missing. @zhjwpku
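
The rule agreed on in this thread can be sketched as follows. This is a minimal, standard-library-only illustration, not the code merged in this PR: `SummaryMap` and `ResolveSummary` are hypothetical names, and the already-decoded summary is modeled as a plain string map rather than an `nlohmann::json` value.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an already-decoded "summary" JSON object.
using SummaryMap = std::unordered_map<std::string, std::string>;

// Hypothetical helper, not part of iceberg-cpp.
// - No summary at all (allowed for v1): return an empty map.
// - Summary present but "operation" missing (malformed metadata):
//   fall back to "overwrite" instead of reporting an error.
// - Summary present with "operation": keep it unchanged.
SummaryMap ResolveSummary(const std::optional<SummaryMap>& summary) {
  if (!summary.has_value()) {
    return {};
  }
  SummaryMap resolved = *summary;
  if (resolved.find("operation") == resolved.end()) {
    resolved["operation"] = "overwrite";
  }
  return resolved;
}
```

Passing `std::nullopt` models a snapshot whose JSON has no summary field; passing a map without an `operation` key models the malformed metadata mentioned above.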

Collaborator Author

Fixed, please take a look.

Collaborator Author

zhjwpku commented Apr 15, 2025

I've updated the PR to use ToJson as the serialization function name. It also writes the summary map only when there is an operation, which is the same behavior as the Java impl [1].

[1] https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L69-L82
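
As a rough illustration of the "only write the summary map when there is an operation" behavior described above, here is a standard-library-only sketch. The struct and function names are hypothetical and the JSON writer itself is left out; the function only decides what, if anything, would be emitted under the summary field.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified view of the snapshot fields that feed the summary.
struct SnapshotSummaryFields {
  std::optional<std::string> operation;
  std::map<std::string, std::string> properties;  // entries other than "operation"
};

// Hypothetical helper: returns the key/value pairs to emit under "summary",
// or std::nullopt when the field should be omitted because no operation is set.
std::optional<std::map<std::string, std::string>> SummaryToWrite(
    const SnapshotSummaryFields& fields) {
  if (!fields.operation.has_value()) {
    return std::nullopt;
  }
  std::map<std::string, std::string> out = fields.properties;
  out["operation"] = *fields.operation;
  return out;
}
```

A caller would skip the summary field entirely when the result is empty, so a snapshot parsed without a summary serializes back without one.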

@Fokko Fokko merged commit c343e94 into apache:main Apr 15, 2025
6 checks passed
Contributor

Fokko commented Apr 15, 2025

Thanks for working on this @zhjwpku, and thanks for the review @lidavidm, @yingcai-cy and @wgtmac 🙌

@zhjwpku zhjwpku deleted the snapshot_ser_der branch May 1, 2025 11:37
5 participants